Similar resources
Text Segmentation by Language Using Minimum Description Length
The problem addressed in this paper is to segment a given multilingual document into segments for each language and then identify the language of each segment. The problem was motivated by an attempt to collect a large amount of linguistic data for non-major languages from the web. The problem is formulated in terms of obtaining the minimum description length of a text, and the proposed solutio...
Multiple text segmentation for statistical language modeling
In this article we deal with the text segmentation problem in statistical language modeling for under-resourced languages with a writing system without word boundary delimiters. While the lack of text resources has a negative impact on the performance of language models, the errors introduced by the automatic word segmentation make those data even less usable. To better exploit the text resour...
Text Segmentation by
We investigate the problem of text segmentation by topic. Applications for this task include topic tracking of broadcast speech data and topic identification in full-text databases. Researchers have tackled similar problems before but with different goals. This study focuses on data with relatively small segment sizes and for which within-segment sentences have relatively few words in common maki...
Unsupervised Text Segmentation Based on Native Language Characteristics
Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language. We propose a Bayesian unsupervised text segmentation approach to the latter. While baseline models achieve essentially random segmentation on our task, indicating its difficulty, a Bayesian mod...
Journal
Journal title: Sistemas y Telemática
Year: 2016
ISSN: 1692-5238
DOI: 10.18046/syt.v14i38.2289